Deep Lake: a Lakehouse for Deep Learning

Hambardzumyan, Sasun, Tuli, Abhinav, Ghukasyan, Levon, Rahman, Fariz, Topchyan, Hrant, Isayan, David, McQuade, Mark, Harutyunyan, Mikayel, Hakobyan, Tatevik, Stranic, Ivo, Buniatyan, Davit

arXiv.org Artificial Intelligence

Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well designed for applications such as natural language processing (NLP), audio processing, and computer vision, or for applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake retains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) the Tensor Query Language, (b) an in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and integrate with numerous MLOps tools.
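The streaming idea in the abstract can be sketched in plain Python: samples are grouped into chunks, and a training loop consumes one chunk at a time instead of loading the whole dataset. This is a minimal stdlib-only illustration of the concept; the names and chunking scheme are assumptions for the sketch, not Deep Lake's actual API or format.

```python
import itertools

CHUNK_SIZE = 4  # samples per chunk; real systems size chunks in bytes, not counts

def make_chunks(samples, chunk_size=CHUNK_SIZE):
    """Group a flat sequence of samples into chunk-sized lists."""
    it = iter(samples)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def stream(chunks):
    """Yield individual samples from a chunk iterator, one chunk at a time."""
    for chunk in chunks:
        # In a real lakehouse this is where a chunk would be fetched over the
        # network and decoded, ideally before the GPU runs out of work.
        yield from chunk

dataset = list(range(10))                 # stand-in for ten tensor samples
streamed = list(stream(make_chunks(dataset)))
print(streamed)                           # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The point of the chunked layout is that the consumer never needs more than one chunk in memory, which is what makes streaming to a remote training process practical.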


MLOps vs. DevOps: What are the Similarities and Differences?

#artificialintelligence

You've almost certainly heard of DevOps before, especially if you work in the tech world, but you may or may not have heard of MLOps. A newer development in the machine learning world, MLOps is quickly taking hold due to its effective translation of classic DevOps principles. While these two disciplines are related, as the similar names indicate, they also have some key differences you should know about. DevOps is short for software development and IT operations, and the term encompasses both the practices and tools that make up DevOps and the cultural mindset behind them. DevOps represented a major shift in the IT world in the 2010s, moving away from slow, complicated processes toward faster, more iterative development.


Things You Need To Know About Data Science

#artificialintelligence

The field of data science is large and fast-expanding. It's no surprise that so many people want to learn more about it! But what is data science, and what do you need to know if you want to work in this field? One of the most important things to understand about data science is that it is a very hands-on and ever-changing discipline. It's critical to keep learning new things in order to stay current with the latest trends and practices in the field.


A New Way of Managing Deep Learning Datasets - KDnuggets

#artificialintelligence

Hub by Activeloop is an open-source Python package that arranges data in NumPy-like arrays. It integrates smoothly with deep learning frameworks such as TensorFlow and PyTorch for faster GPU processing and training. We can update the data, visualize it, and create machine learning pipelines using the Hub API. Hub lets us store images, audio, video, and time-series data in a way that can be accessed at lightning speed. The data can be stored in GCS/S3 buckets, in local storage, or on the Activeloop cloud.


Identify, version control, and document the best performing model during training

#artificialintelligence

Model training can be seen as the generation of successive versions of a model: after each batch, the model weights are adjusted, and as a result a new version of the model is created. Each new version will have a different level of performance (as evaluated against a validation set). If everything goes well, training and validation loss will decrease with the number of training epochs. However, the best-performing version of a model (here abbreviated as the best model) is rarely the one obtained at the end of the training process. Take a typical overfitting case: at first, both training and validation losses decrease as training progresses, but eventually the validation loss starts to rise even as the training loss keeps falling, so the best model is the version saved just before that turning point.
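The best-model bookkeeping described above can be sketched in a few lines: keep whichever version has the lowest validation loss seen so far. The loss values below are fabricated to mimic overfitting, and the `BestModelTracker` name is illustrative rather than tied to any particular framework.

```python
class BestModelTracker:
    """Remember the epoch and weights with the lowest validation loss so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_weights = None

    def update(self, epoch, val_loss, weights):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            # In a real setup this is where you would serialize a checkpoint
            # (and record it in your experiment tracker) instead of copying.
            self.best_weights = dict(weights)

# Simulated run: validation loss improves until epoch 3, then overfitting sets in.
val_losses = [0.90, 0.60, 0.45, 0.40, 0.47, 0.55]
tracker = BestModelTracker()
for epoch, loss in enumerate(val_losses):
    tracker.update(epoch, loss, weights={"epoch": epoch})

print(tracker.best_epoch, tracker.best_loss)  # → 3 0.4
```

Note that the final version (epoch 5) is not the best one, which is exactly why the best model has to be identified and saved during training rather than at the end.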


Top Python libraries of 2021 you should know about

#artificialintelligence

Welcome to a new edition (the 7th!) of our yearly Top Python Libraries list! Starting in December 2015, and uninterruptedly since then, we have been compiling the best Python libraries launched or popularized each year (or late in the previous year). It all started as a "Top 10" series, and although we still have 10 main picks, we nowadays list many more libraries. The work the Python community has been doing is just too good, and we want to give YOU a chance to find these great libraries in case they haven't yet crossed your path. If you are not a fan of most top-10-style posts, bear with us and give this one a chance.


Machine Learning in the Browser

#artificialintelligence

Google Colaboratory, often referred to as Colab, is a product created by Google that lets anyone create and run Python code in the browser. It has many standard machine learning and data science libraries built in, including pandas and scikit-learn, and you can install practically any other Python library for use in each notebook. To access Colab you need to sign up for a Google account, which gives you free access to the notebook environment and computing resources that include GPUs. Let's walk through a quick demo.


YMIR: A Rapid Data-centric Development Platform for Vision Applications

Huang, Phoenix X., Hu, Wenze, Brendel, William, Chandraker, Manmohan, Li, Li-Jia, Wang, Xiaoyu

arXiv.org Artificial Intelligence

This paper introduces an open-source platform to support the rapid development of computer vision applications at scale. The platform puts efficient data development at the center of the machine learning development process, integrates active learning methods and data and model version control, and uses concepts such as projects to enable fast iteration over multiple task-specific datasets in parallel. The platform abstracts the development process into core states and operations, and integrates third-party tools via open APIs as implementations of those operations. This open design reduces the development cost and adoption cost for ML teams with existing tools. At the same time, the platform supports recording project development histories, through which successful projects can be shared to further boost model production efficiency on similar tasks. The platform is open source and is already used internally to meet the increasing demand for different real-world computer vision applications.
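The "core states and operations" abstraction can be sketched as a tiny state machine: a project cycles through data-centric stages, and the operation behind each stage is pluggable so an external tool can implement it. The stage names and the transition table below are assumptions made for illustration, not YMIR's actual design.

```python
# Illustrative stages of one active-learning iteration (not YMIR's real states).
LABEL, TRAIN, MINE = "label", "train", "mine"
TRANSITIONS = {LABEL: TRAIN, TRAIN: MINE, MINE: LABEL}  # one closed loop

def run_iteration(state, operations):
    """Run the operation registered for the current state, then advance."""
    operations[state]()          # e.g. call out to a labeling or training tool
    return TRANSITIONS[state]

log = []
# Each operation is just a pluggable callable; here they only record themselves.
ops = {s: (lambda s=s: log.append(s)) for s in (LABEL, TRAIN, MINE)}

state = LABEL
for _ in range(3):               # one full data-centric iteration
    state = run_iteration(state, ops)

print(log, state)  # → ['label', 'train', 'mine'] label
```

Because the operations are looked up rather than hard-coded, swapping in a different labeling or training backend only changes the `ops` dictionary, which is the spirit of integrating third-party tools via open APIs.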


GitHub - replicate/keepsake: Version control for machine learning

#artificialintelligence

Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage. You can get the data back out using the command-line interface or a notebook. Once it's wired into your training code, Keepsake tracks everything: code, hyperparameters, training data, weights, metrics, Python dependencies, and so on. Your experiments are all in one place, with filtering and sorting. Because the data is stored on S3, you can even see experiments that were run on other machines.
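The kind of record such a tool keeps can be sketched with the standard library alone: one directory per experiment, holding the weights plus a JSON file of hyperparameters and metrics, keyed by a content-derived id. The layout and names here are assumptions for illustration, not Keepsake's actual storage format or API.

```python
import hashlib
import json
import pathlib
import tempfile

def save_experiment(root, params, metrics, weights_bytes):
    """Write one experiment's weights and metadata under a content-derived id."""
    digest = hashlib.sha256(
        weights_bytes + json.dumps(params, sort_keys=True).encode()
    ).hexdigest()
    exp_dir = pathlib.Path(root) / digest[:12]
    exp_dir.mkdir(parents=True)
    (exp_dir / "weights.bin").write_bytes(weights_bytes)
    (exp_dir / "metadata.json").write_text(
        json.dumps({"params": params, "metrics": metrics})
    )
    return exp_dir.name

root = tempfile.mkdtemp()
exp_id = save_experiment(root, {"lr": 0.01}, {"val_loss": 0.4}, b"\x00\x01")
stored = json.loads((pathlib.Path(root) / exp_id / "metadata.json").read_text())
print(stored["params"]["lr"], stored["metrics"]["val_loss"])  # → 0.01 0.4
```

Swapping the local directory for an S3 or GCS bucket is what makes experiments visible across machines: the record is plain files, so any machine that can read the bucket can list and compare runs.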


A New Era for Mechanical CAD

Communications of the ACM

Computer-Aided Design (CAD) has been around since the 1950s. The first graphical CAD program, called Sketchpad, came out of MIT (designworldonline.com). Since then, CAD has become essential to designing and manufacturing hardware products. Today, there are multiple types of CAD. This article focuses on mechanical CAD, used for mechanical engineering. Digging into the history of computer graphics reveals some interesting connections between the most ambitious and notorious engineers. Ivan Sutherland, who received the Turing Award for Sketchpad in 1988, had Edwin Catmull as a student.